A. The Agentic Shift

From “user-in-the-loop” to “user-on-the-loop”

Agenda

  • A. The Agentic Shift — Why agents matter (and when they don’t)
  • B. The Continuum — Four levels of AI system autonomy
  • C. The Decision Framework — When NOT to use an agent
  • D. ReAct Theory — Thought -> Action -> Observation
  • E. Native vs Text — Two approaches to tool calling
  • F. Wrap-up — Key takeaways & lab preview

What Is an “Agent”?

An agent is an LLM that operates in a loop with access to tools, memory, and the ability to decide its own next step.

The Core Insight

An agent is not magic. It is a while loop with state and reasoning. The LLM is the brain — everything else is engineering.

The Hype vs Reality

The Promise

  • Autonomous task completion
  • Multi-step reasoning
  • Tool orchestration
  • Self-correction

The Reality

  • Unpredictable costs ($0.02 or $2.00?)
  • Infinite loops
  • Hallucinated tool calls
  • Debugging nightmares

Your first job as an AI engineer: Be a discerning architect. Not every problem needs an agent.

B. The Continuum

Four levels from static to fully autonomous

The AI System Spectrum

graph LR
    A["Static LLM<br/>Call"]
    A --> B["Tool<br/>Calling"]
    B --> C["Single Agent<br/>+ Planning"]
    C --> D["Multi-Agent<br/>System"]

    style A fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style B fill:#9B8EC0,stroke:#1C355E,color:#1C355E
    style C fill:#FF7A5C,stroke:#1C355E,color:#1C355E
    style D fill:#FF7A5C,stroke:#1C355E,color:#1C355E

Each step to the right adds: more autonomy, more cost, more complexity, less predictability.

Level 1: Static LLM Call

A single prompt -> a single response. No memory, no tools, no loops.

from litellm import completion  # provider-agnostic LLM client

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Classify sentiment: 'Great product!'"}]
)
# -> "Positive"
  • Latency: ~1 second
  • Cost: ~$0.01
  • Use when: Tasks are deterministic and context-independent

Level 2: Tool Calling

The LLM decides which function to call and with what arguments.

sequenceDiagram
    participant U as User
    participant L as LLM
    participant T as Tool

    U->>L: "What's the weather in Riyadh?"
    L->>T: get_weather(location="Riyadh")
    T-->>L: {"temp": 38, "condition": "sunny"}
    L->>U: "It's 38C and sunny in Riyadh."

Not Yet an Agent

Tool calling is a controlled extension — the control flow is still linear. One request, one tool call, one answer.
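The linear flow in the diagram can be sketched as a single dispatch step. This is a minimal sketch, not a production pattern: the `get_weather` stub and the `TOOLS` registry are hypothetical, and a real tool call would arrive as structured JSON from the model.

```python
import json

def get_weather(location: str) -> dict:
    """Hypothetical stub; a real tool would call a weather API."""
    return {"temp": 38, "condition": "sunny"}

TOOLS = {"get_weather": get_weather}  # registry: tool name -> callable

def dispatch(tool_call: dict) -> str:
    """Execute one structured tool call and return its result as JSON."""
    fn = TOOLS[tool_call["name"]]
    return json.dumps(fn(**tool_call["arguments"]))

result = dispatch({"name": "get_weather", "arguments": {"location": "Riyadh"}})
print(result)  # {"temp": 38, "condition": "sunny"}
```

One request, one tool call, one answer: there is no loop, which is exactly why this is not yet an agent.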

Level 3: Single Agent with Planning

The agent plans a sequence of steps, executes them with tools, and adapts based on results.

# The agent autonomously decides:
# Step 1: Search "transformer architectures 2023"
# Step 2: Read top 3 results
# Step 3: Extract key contributions
# Step 4: Synthesize into report
# (Each step is an LLM call + tool execution)
  • Latency: 5-30 seconds (multiple LLM calls)
  • Cost: $0.10-$2.00 per query
  • Must implement timeouts and max-step limits
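The timeout and max-step guards mentioned above can be enforced with a simple wall-clock budget. A sketch under stated assumptions: `run_with_budget` and its `step_fn` callback are illustrative names, not a library API.

```python
import time

def run_with_budget(step_fn, max_steps: int = 10, max_seconds: float = 60.0):
    """Call step_fn until it returns a result, or a step/time limit is hit."""
    deadline = time.monotonic() + max_seconds
    for step in range(max_steps):
        if time.monotonic() > deadline:
            return "Time budget exceeded"
        result = step_fn(step)       # one "step" = one LLM call + tool execution
        if result is not None:
            return result            # the agent produced a final answer
    return "Max steps reached"       # safety limit

print(run_with_budget(lambda s: "done" if s == 2 else None))  # done
```

Both limits matter: the step cap bounds cost, the time budget bounds latency.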

Level 4: Multi-Agent System

Specialized agents collaborate — Researcher, Analyst, Writer — coordinated by an Orchestrator.

graph TB
    O[Orchestrator] --> R[Researcher]
    O --> A[Analyst]
    O --> W[Writer]
    R -->|findings| O
    A -->|analysis| O
    W -->|draft| O
    A -.->|review| W

    style O fill:#1C355E,stroke:#00C9A7,color:white
    style R fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style A fill:#9B8EC0,stroke:#1C355E,color:#1C355E
    style W fill:#FF7A5C,stroke:#1C355E,color:#1C355E

  • Most complex to debug and monitor
  • Justify only for multi-faceted tasks

The Spectrum at a Glance

| Level | Architecture | Latency | Cost | When to Use |
|-------|--------------|---------|------|-------------|
| 1 | Static LLM | ~1s | $0.01 | Classification, extraction |
| 2 | Tool Calling | ~3s | $0.05 | Single action + response |
| 3 | Single Agent | 5-30s | $0.10-$2.00 | Multi-step research |
| 4 | Multi-Agent | 10-60s | $0.50-$5.00 | Complex, multi-faceted tasks |

Career Insight: Demonstrating you understand this continuum and can justify architectural choices is more impressive than saying “I built an agent.”

C. The Decision Framework

When NOT to use an agent

The Three Tradeoffs

Every step up the continuum increases three costs:

Reliability

Each LLM call has a chance of failure. More calls = more failure points.

Single call: 95% reliable. A 5-step agent: 0.95^5 ≈ 77% reliable.
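The compounding arithmetic behind these reliability numbers, as a quick check:

```python
# If each LLM call succeeds with probability p, an n-step run succeeds
# only if every step does: p ** n.
p = 0.95
for n in (1, 5, 10):
    print(f"{n}-step: {p ** n:.0%}")
# 1-step: 95%
# 5-step: 77%
# 10-step: 60%
```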

Cost

Agents multiply token usage. A 10-step agent makes 10x the LLM calls of a single request, and since the context grows with every step, total token usage is often well above 10x.

Budget per query matters.

Latency

Each planning step adds 1-3 seconds. Users notice after 5 seconds.

Multi-agent can hit 30-60s.

The Decision Tree

graph LR
    Q1{"Does the task need<br/>multiple steps?"}
    Q2{"Does it need<br/>external data or actions?"}
    Q3{"Can steps be<br/>pre-determined?"}
    Q4{"Do you need independent<br/>quality checks?"}


    A2["Tool Calling"]
    A4["Single Agent"]
    A5["Multi-Agent"]


    Q1 -->|Yes| Q2
    Q2 -->|Yes| Q3
    Q3 -->|Yes| A2
    Q3 -->|"No — needs planning"| Q4
    Q4 -->|No| A4
    Q4 -->|Yes| A5

    style A2 fill:#00C9A7,stroke:#1C355E,color:#1C355E

    style A4 fill:#FF7A5C,stroke:#1C355E,color:#1C355E
    style A5 fill:#FF7A5C,stroke:#1C355E,color:#1C355E

Real-World Examples

| Task | Right Architecture | Why |
|------|--------------------|-----|
| “Summarize this email” | Static LLM | One pass, no tools needed |
| “Book a meeting at 2pm” | Tool Calling | One tool, predictable flow |
| “Research X and write a report” | Single Agent | Multi-step, needs planning |
| “Compare EU vs US policy, with quality review” | Multi-Agent | Independent research + review gate |

The Golden Rule

Start with the simplest architecture that works. Only upgrade when you hit a specific bottleneck: context overflow, quality degradation, or throughput limits.

D. ReAct Theory

Thought -> Action -> Observation

What Is ReAct?

ReAct (Reasoning + Acting) was introduced by Yao et al. (2022). It is the most widely adopted agent loop pattern.

The core idea: instead of calling tools blindly, the agent explicitly reasons about what it observes and what to do next at every step.

The ReAct Loop

graph LR
    T["THOUGHT<br/>Agent reasons about<br/>current state"] --> A["ACTION<br/>Agent calls a tool<br/>with arguments"]
    A --> O["OBSERVATION<br/>System returns<br/>tool result"]
    O --> T
    O --> F["FINAL ANSWER<br/>Agent has enough<br/>info to respond"]

    style T fill:#1C355E,stroke:#00C9A7,color:white
    style A fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style O fill:#9B8EC0,stroke:#1C355E,color:#1C355E
    style F fill:#FF7A5C,stroke:#1C355E,color:#1C355E

Each iteration is one “step” in the agent’s execution.

ReAct in Action — Example

Query: “What is the population of the capital of France?”

| Step | Phase | Content |
|------|-------|---------|
| 1 | Thought | I need to find the capital of France first, then look up its population. |
| 1 | Action | search("capital of France") |
| 1 | Observation | “The capital of France is Paris.” |
| 2 | Thought | Now I know it’s Paris. I need to find the population of Paris. |
| 2 | Action | search("population of Paris") |
| 2 | Observation | “The population of Paris is approximately 2.1 million.” |
| 3 | Final Answer | “The population of the capital of France (Paris) is approximately 2.1 million.” |
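Rendered as data, the trace above is just the message list the agent accumulates. A sketch: the role names follow the common chat-API convention, which is an assumption about the transport, not part of ReAct itself.

```python
trace = [
    {"role": "user", "content": "What is the population of the capital of France?"},
    {"role": "assistant", "content": 'Thought: I need the capital first.\nAction: search("capital of France")'},
    {"role": "tool", "content": "The capital of France is Paris."},
    {"role": "assistant", "content": 'Thought: Now find the population of Paris.\nAction: search("population of Paris")'},
    {"role": "tool", "content": "The population of Paris is approximately 2.1 million."},
    {"role": "assistant", "content": "Final Answer: The population of the capital of France (Paris) is approximately 2.1 million."},
]
# Each Thought/Action pair is one LLM call; each tool message is one Observation.
print(len(trace))  # 6
```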

Why Not Just Call Tools Directly?

Without explicit reasoning, agents fail in predictable ways:

Without ReAct

  • Skips steps, jumps to conclusions
  • Calls irrelevant tools
  • Cannot explain its decisions
  • Debugging is impossible

With ReAct

  • Plans before acting
  • Selects tools based on reasoning
  • Full audit trail of “why”
  • Every step is traceable

The Thought step is not overhead — it is the entire point. It makes agents debuggable.

The Agent Core Components

Every agent has three building blocks:

graph TB
    subgraph Agent["Agent System"]
        M["MEMORY<br/>Conversation history<br/>+ past actions"]
        P["PLANNING<br/>ReAct reasoning<br/>+ step decomposition"]
        A["ACTION<br/>Tool execution<br/>+ result handling"]
    end

    M --> P
    P --> A
    A --> M

    style M fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style P fill:#9B8EC0,stroke:#1C355E,color:#1C355E
    style A fill:#FF7A5C,stroke:#1C355E,color:#1C355E

  • Memory: Persists context across turns (conversation buffer, summaries, vector memory)
  • Planning: Reasons about what to do next (ReAct, Plan-and-Execute, CoT)
  • Action: Executes tools and returns results safely
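As a concrete example of the Memory block, the simplest conversation buffer is an append-only list with a size cap. A minimal sketch: the class name and the drop-oldest policy are illustrative choices, not the only design.

```python
class ConversationBuffer:
    """Minimal agent memory: keep only the most recent turns."""

    def __init__(self, max_messages: int = 20):
        self.max_messages = max_messages
        self.messages: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Evict the oldest turns once the cap is exceeded.
        self.messages = self.messages[-self.max_messages:]

buf = ConversationBuffer(max_messages=4)
for i in range(6):
    buf.add("user", f"turn {i}")
print([m["content"] for m in buf.messages])
# ['turn 2', 'turn 3', 'turn 4', 'turn 5']
```

Summaries and vector memory are refinements of the same idea: bound what is fed back into the Planning step.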

E. Native vs Text Calling

Two approaches — start with text, ship with native

Approach 1: Text-Based ReAct

The agent outputs plain text that your code parses for tool calls.

REACT_PROMPT = """Answer the question using tools. Format EXACTLY:

Thought: <your reasoning>
Action: tool_name(arg1="value1", arg2="value2")

When ready, respond with:
Thought: I have enough information.
Final Answer: <your answer>
"""

# You must manually parse "Action: search(query='...')" from the response

The Pain

Parsing free-text output is fragile. The model might say Action: search query="..." or Action: search("...") — every variation breaks your parser.
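A sketch of the parsing this requires makes the fragility concrete. The regex and helper name are illustrative, and the second call below shows exactly the failure mode described above:

```python
import re

# Matches lines like:  Action: search(query="Paris")
ACTION_RE = re.compile(r'^Action:\s*(\w+)\((.*)\)\s*$', re.MULTILINE)

def parse_action(text: str):
    """Extract (tool_name, raw_args) from a ReAct response, or None."""
    m = ACTION_RE.search(text)
    if m is None:
        return None  # the model deviated from the required format
    return m.group(1), m.group(2)

print(parse_action('Thought: need weather\nAction: search(query="Paris")'))
# ('search', 'query="Paris"')
print(parse_action('Action: search query="Paris"'))
# None -- missing parentheses break the parser
```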

Approach 2: Native Function Calling

The API returns a structured JSON tool call — no parsing needed.

from litellm import completion  # provider-agnostic LLM client

response = completion(
    model="gpt-4o",
    messages=messages,
    tools=[{                              # Schema tells the model what's available
        "type": "function",
        "function": {
            "name": "search",
            "parameters": {"type": "object", "properties": {...}}
        }
    }],
    tool_choice="auto",                   # Model decides when to call
)

# response.choices[0].message.tool_calls -> structured, guaranteed JSON

Text vs Native Comparison

| Aspect | Text-Based | Native (API) |
|--------|------------|--------------|
| Parsing | Manual regex/string parsing | Structured JSON from API |
| Reliability | Fragile — model may deviate | Robust — guaranteed format |
| Debugging | See the raw reasoning text | See structured tool calls |
| Learning Value | Understand the mechanics | Production-grade |
| Supported Models | Any LLM | OpenAI, Anthropic, Gemini |

Our Approach

In the lab, you will build a text-based agent first (to understand the mechanics), then examine a native agent (for production robustness).

The while Loop — Demystified

At its core, every agent is this pattern:

def run(query: str, max_steps: int = 10) -> str:
    messages = [system_prompt, user_query]

    for step in range(max_steps):         # The loop
        response = llm(messages)          # Ask the LLM

        if response.has_tool_calls():     # Does it want to act?
            results = execute_tools(response.tool_calls)
            messages.append(results)      # Feed results back

        elif response.has_content():      # Does it have an answer?
            return response.content       # Done!

    return "Max steps reached"            # Safety limit

That’s it. Everything else — memory, tracing, parallelism — is built on top of this loop.

F. Wrap-up

Key Takeaways

  1. Agents are while loops with state and reasoning — not magic
  2. The Continuum tells you which architecture fits your problem
  3. Start simple — upgrade only when you hit a specific bottleneck
  4. ReAct (Thought -> Action -> Observation) makes agents debuggable
  5. Native calling is more robust, but text-based teaches the fundamentals

Lab Preview: Building the Brain

Part 1: The “Raw” Agent

  • Build a ReAct loop from scratch
  • Manually parse Thought/Action/Observation
  • Feel the pain of text parsing

Part 2: Code Walkthrough

  • Examine ReactAgent in the project
  • Native function calling with LiteLLM
  • Compare robustness vs your notebook

Time: 75 minutes

Questions?

Session 1 Complete